Smoothing spline

The smoothing spline is a method of smoothing (fitting a smooth curve to a set of noisy observations) using a spline function.

Contents

Definition

Let (x_i,Y_i);x_1<x_2<\dots<x_n, i \in \mathbb{Z} be a sequence of observations, modeled by the relation Y_i = \mu(x_i). The smoothing spline estimate \hat\mu of the function \mu is defined to be the minimizer (over the class of twice differentiable functions) of[1]


\sum_{i=1}^n (Y_i - \hat\mu(x_i))^2 %2B \lambda \int_{x_1}^{x_n} \hat\mu''(x)^2 \,dx.

Remarks:

  1. \lambda \ge 0 is a smoothing parameter, controlling the trade-off between fidelity to the data and roughness of the function estimate.
  2. The integral is evaluated over the range of the x_i.
  3. As \lambda\to 0 (no smoothing), the smoothing spline converges to the interpolating spline.
  4. As \lambda\to\infty (infinite smoothing), the roughness penalty becomes paramount and the estimate converges to a linear least squares estimate.
  5. The roughness penalty based on the second derivative is the most common in modern statistics literature, although the method can easily be adapted to penalties based on other derivatives.
  6. In early literature, with equally-spaced x_i, second or third-order differences were used in the penalty, rather than derivatives.
  7. When the sum-of-squares term is replaced by a log-likelihood, the resulting estimate is termed penalized likelihood. The smoothing spline is the special case of penalized likelihood resulting from a Gaussian likelihood.

Derivation of the smoothing spline

It is useful to think of fitting a smoothing spline in two steps:

  1. First, derive the values \hat\mu(x_i);i=1,\ldots,n.
  2. From these values, derive \hat\mu(x) for all x.

Now, treat the second step first.

Given the vector \hat{m} = (\hat\mu(x_1),\ldots,\hat\mu(x_n))^T of fitted values, the sum-of-squares part of the spline criterion is fixed. It remains only to minimize \int \hat\mu''(x)^2 \, dx, and the minimizer is a natural cubic spline that interpolates the points (x_i,\hat\mu(x_i)). This interpolating spline is a linear operator, and can be written in the form


  \hat\mu(x) = \sum_{i=1}^n \hat\mu(x_i) f_i(x)

where f_i(x) are a set of spline basis functions. As a result, the roughness penalty has the form


\int \hat\mu''(x)^2 dx = \hat{m}^T A \hat{m}.

where the elements of A are \int f_i''(x) f_j''(x)dx. The basis functions, and hence the matrix A, depend on the configuration of the predictor variables x_i, but not on the responses Y_i or \hat m.

Now back the first step. The penalized sum-of-squares can be written as


\|Y - \hat m\|^2 %2B \lambda \hat{m}^T A \hat m,

where Y=(Y_1,\ldots,Y_n)^T. Minimizing over \hat m gives


\hat m = (I %2B \lambda A)^{-1} Y.

De Boor's approach

De Boor's approach exploits the same idea, of finding a balance between having a smooth curve and being close to the given data.[2]

p\sum_{i=1}^n ((Y_i - \hat\mu(x_i))/\delta_i)^2%2B(1-p)\int \hat\mu^{(m)}(x)^2 \, dx

where p is a parameter called smooth factor and belongs to the interval [0,1], and \delta_i;i=1,\dots,n are the quantities controlling the extent of smoothing (they represent the weight \delta_i^{-2} of each point Y_i). In practice, since cubic splines are mostly used, m is usually 2. The solution for m=2 was proposed by Reinsch in 1967.[3] For m=2, when p approaches 1, \hat\mu converges to the "natural" spline interpolant to the given data.[2] As p approaches 0, \hat\mu converges to a straight line (the smoothest curve). Since finding a suitable value of p is a task of trial and error, a redundant constant S was introduced for convenience.[3] S is used to numerically determine the value of p so that the function \hat\mu meets the following condition:

\sum_{i=1}^n ((Y_i - \hat\mu(x_i))/\delta_i)^2 \le S

The algorithm described by de Boor starts with p=0 and increases p until the condition is met.[2]. If \delta_i is an estimation of the standard deviation for Y_i, the constant S is recommended to be chosen in the interval [n-\sqrt{2n},n%2B\sqrt{2n}]. Having S=0 means the solution is the "natural" spline interpolant.[3] Increasing S means we obtain a smoother curve by getting farther from the given data.

Creating a multidimensional spline

Given the constraint from the definition formula X_1<X_2< \dots <X_n we can conclude that the algorithm doesn't work for any sets of data. If we plan to use this algorithm for random points in a multidimensional space we need to find a solution to give as input to the algorithm sets of data where these constraints are met. A solution for this is to introduce a parameter so that the input data would be represented as single-valued functions depending on that parameter; after this the smoothing will be performed for each function. In a bidimensional space a solution would be to parametrize x and y so that they would become x(t) and y(t) where t_1<t_2< \dots <t_n. A convenient solution for t is the cumulating distance t_{i%2B1}=t_{i}%2B\sqrt{(x_{i%2B1}-x_{i})^2%2B(y_{i%2B1}-y_{i})^2} where t_1=0.[4][5]

A more detailed analysis on parametrization is done by E.T.Y Lee.[6]

Related methods

Smoothing splines are related to, but distinct from:

Source code

Source code for spline smoothing can be found in the examples from Carl de Boor's book A Practical Guide to Splines. The examples are in Fortran programming language. The updated sources are available also on Carl de Boor's official site[1].

Further reading

References

  1. ^ Hastie, T. J.; Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall. ISBN 0-412-34390-8. 
  2. ^ a b c De Boor, C. (2001). A Practical Guide to Splines (Revised Edition). Springer. pp. 207–214. ISBN 0-387-90356-9. 
  3. ^ a b c Reinsch, Christian H. "Smoothing by Spline Functions". http://www.cise.ufl.edu/class/cap5416fa10/resources/Reinsch_1967.pdf. Retrieved 11 March 2011. 
  4. ^ Robert E. Smith Jr., Joseph M Price and Lona M. Howser. "A Smoothing Algorithm Using Cubic Spline Functions". http://www.pdas.com/refs/tnd7397.pdf. Retrieved 31 May 2011. 
  5. ^ N. Y. Graham. "Smoothing With Periodic Cubic Splines". http://www.alcatel-lucent.com/bstj/vol62-1983/articles/bstj62-1-101.pdf. Retrieved 31 May 2011. 
  6. ^ E.T.Y. Lee. "Choosing nodes in parametric curve interpolation". http://www.cs.bgu.ac.il/~leonid/na105/Splines/Lee.pdf. Retrieved 28 June 2011. 
  7. ^ Ruppert, David; Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge University Press. ISBN 0-521-78050-0.